The Information Matrix in Latent-Variable Models

Abstract 

The  information  matrix  for  the  parameters  in  a  latent- 
variable  model  is  bounded  from  above  by  the  information  that  would 
obtain  if  the  values  of  the  latent  variables  could  also  be 
observed.  The  difference  is  the  "missing  information."  This 
paper  discusses  the  structure  of  the  information  matrix,  and 
characterizes  the  degree  to  which  missing  information  can  be 
recovered  by  exploiting  collateral  variables  for  respondents.  The 
results  are  illustrated  with  data  from  the  Armed  Services 
Vocational  Aptitude  Battery. 


KEY  WORDS:  collateral  information,  item  response  theory,  latent 
variables,  missing  information  principle. 
1. Introduction


Latent variable models are used in the social sciences to provide
parsimonious descriptions of the associations among observable
variables in terms of theoretically derived constructs. Let
$x = (x_1, \ldots, x_n)'$ denote observable variables, $\theta$ denote
latent variables, and $\beta_1$ denote parameters of the regressions of
the $x_j$s on $\theta$ through the known functions
$f_j(x_j|\theta,\beta_1)$. Under the assumption of conditional
independence,

$$f(x|\theta,\beta_1) = \prod_j f_j(x_j|\theta,\beta_1).$$

If $g(\theta|\beta_2)$ is the density function of $\theta$ in a
population of interest, then the density of the observed variables is
given by the mixture

$$h(x|\beta) = \int f(x|\theta,\beta_1)\, g(\theta|\beta_2)\, d\theta,$$

where $\beta = (\beta_1,\beta_2)$. For notational convenience, we shall
suppress dependence on $\beta_1$ and $\beta_2$ in f, g, h, and
elsewhere.

Some  examples  of  latent-variable  models  follow: 


o  An item response theory (IRT) model for an n-item educational
   test gives the probability that an examinee will respond
   correctly to item j ($x_j = 1$ rather than 0) as a function of
   (i) an unobservable scalar $\theta$ characterizing the proficiency
   of the examinee and (ii) a possibly vector-valued parameter
   $\beta_{1j}$ characterizing the regression of $x_j$ on $\theta$
   (Lord 1980). In this case, there is a distinct subparameter for
   each item, so that $\beta_1 = (\beta_{11}, \ldots, \beta_{1n})$.

o  In factor analysis models (Thurstone 1948), the "factor
   loadings" $\beta_{1j}$ of the observed variable $x_j$ are
   coefficients of its linear regression on unobservable trait values
   $\theta$. It is often assumed that g is a standardized
   multivariate normal density.

o  In latent class models (Lazarsfeld 1950), $\beta_1$ implies response
   probabilities for items from members of classes 1 through K,
   but respondents' class memberships $\theta$ are not observed. Their
   distribution g is multinomial, with parameters $\beta_2$.
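The mixture density h can be made concrete for the IRT case. The following is a minimal sketch, not the authors' procedure: it assumes a two-parameter logistic response function with scaling constant 1.7, a standard normal g, and Gauss-Hermite quadrature for the integral. The item parameters `a` and `b` and the helper names are hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def p_correct(theta, a, b):
    """Assumed logistic item response function (scaling constant 1.7)."""
    return 1.0 / (1.0 + np.exp(-1.7 * a * (theta - b)))

def marginal_prob(x, a, b, n_nodes=41):
    """h(x | beta): integrate f(x | theta) against a standard normal g(theta)."""
    nodes, weights = hermegauss(n_nodes)
    weights = weights / weights.sum()            # rescale to N(0, 1) weights
    p = p_correct(nodes[:, None], a, b)          # shape (nodes, items)
    # Conditional independence: f(x | theta) is a product over items.
    f = np.prod(np.where(x == 1, p, 1.0 - p), axis=1)
    return float(np.sum(weights * f))

a = np.array([1.0, 0.8, 1.2])      # hypothetical discriminations
b = np.array([-0.5, 0.0, 0.7])     # hypothetical difficulties
patterns = [np.array(bits) for bits in itertools.product([0, 1], repeat=3)]
h = [marginal_prob(x, a, b) for x in patterns]
total = sum(h)                     # marginal probabilities of all 2^3 patterns
```

Because f sums to one over response patterns at every value of the latent variable, the marginal probabilities must also sum to one, which gives a quick sanity check on any such implementation.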

This paper concerns the structure of information matrices
that arise in the estimation of $\beta$. If values of latent variables
are construed as missing data (as in Dempster, Laird, and Rubin
1977), the expected information matrix associated with estimating
$\beta$ from values of x is bounded from above by the expected
information that would obtain if values of $\theta$ were observed as
well (Orchard and Woodbury 1972). The difference is "missing
information." We aim to characterize this loss, and to
demonstrate how covariates, or collateral variables y for
respondents, can be used to recover some of the missing
information for $\beta_1$. The following section gives background and
notation for latent variable models. Subsequent sections discuss
expected information in maximum likelihood (ML) estimation of $\beta$
from x, then in estimation from x and y. Results are then
presented for observed information matrices. Finally, a numerical
illustration with data from the Profile of American Youth (U.S.
Department of Defense 1982) is given.

2. Background and Notation

In the terminology of Dempster, Laird, and Rubin (1977),
estimating $\beta$ from a sample of N independent observations of
$(\theta,x)$ is a "complete-data" problem. The loglikelihood for
$\beta$ is

$$\Lambda_{\theta X} = \sum_{i=1}^{N} \ln [f(x_i|\theta_i,\beta_1)\, g(\theta_i|\beta_2)]. \tag{2.1}$$

Assume that both f and g are twice-differentiable, and define the
gradient vector $s(\theta,x)$ for $\beta$ for the loglikelihood of a
single observation by

$$s(\theta,x) = [s_1(\theta,x), s_2(\theta,x)]
 = \left[\frac{\partial}{\partial\beta_1} \ln f(x|\theta,\beta_1),\;
         \frac{\partial}{\partial\beta_2} \ln g(\theta|\beta_2)\right]. \tag{2.2}$$

Under regularity conditions, the MLE $\hat\beta$ solves the likelihood
equation

$$0 = \sum_{i=1}^{N} s(\theta_i,x_i)$$

and for large N is approximately multivariate normal in repeated
samples, with mean $\beta$ and covariance given by the inverse of the
expected information matrix $N I_{\theta X}$. Using the fact that
$E_{X\theta}(s) = 0$, we further our purpose by writing $I_{\theta X}$
in the following manner:

$$I_{\theta X} = \int\!\!\int s(\theta,x)\, s'(\theta,x)\, p(\theta|x,\beta)\, d\theta\; h(x|\beta)\, dx$$

[where $p(\theta|x,\beta)$ is obtained by Bayes theorem as
$f(x|\theta)\, g(\theta)/h(x)$]

$$= E_X[E_\theta(ss'|x)] = \mathrm{Var}_{X\theta}(s).$$

Let $I^1_{\theta X} = \mathrm{Var}_{X\theta}(s_1)$ denote the block of
the information matrix that pertains to $\beta_1$, and note that (2.2)
implies that the off-diagonal block of $I_{\theta X}$ for elements of
$\beta_1$ and those of $\beta_2$ is zero.
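The zero off-diagonal block follows in one step from conditioning on the latent variable. A short derivation, assuming (as part of the regularity conditions already invoked) that differentiation can be passed through the integral over x:

```latex
\begin{aligned}
E_{X\theta}(s_1 s_2')
  &= E_\theta\{\, E_X[\, s_1(\theta,x) \mid \theta \,]\; s_2'(\theta) \,\} , \\
E_X[\, s_1(\theta,x) \mid \theta \,]
  &= \int \frac{\partial}{\partial\beta_1} f(x|\theta,\beta_1)\, dx
   = \frac{\partial}{\partial\beta_1} \int f(x|\theta,\beta_1)\, dx
   = 0 .
\end{aligned}
```

The first line uses the fact that $s_2$ depends on $\theta$ but not on x; the second shows that the inner expectation vanishes for every $\theta$, so the cross-product has expectation zero.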

Suppose that collateral variables y such as educational or
demographic status could also be observed for respondents, and let
$p(y|\gamma)$ denote their density in a population of interest. In an
extension of conditional independence, it is desirable to posit

$$f(x|\theta,\beta_1,y) = f(x|\theta,\beta_1); \tag{2.3}$$

that is, $\theta$ also explains the associations among observed
responses x and collateral variables. (See Thissen, Steinberg, and
Wainer 1987 on how this assumption can be tested in the context of
IRT.) When (2.3) holds, the joint density of $(x,\theta,y)$ is

$$p(x,\theta,y|\beta_1,\beta_2,\gamma) = f(x|\theta,\beta_1)\, g(\theta|y,\beta_2)\, p(y|\gamma).$$

(Note the slight change in the meaning of g; in a similar manner,
h becomes $h(x|y,\beta)$ when collateral variables are present.) As
long as $\beta_1$, $\beta_2$, and $\gamma$ are distinct, the
loglikelihood induced by observing y along with $\theta$ and x is
equal to (2.1) plus a constant insofar as $\beta$ is concerned.
Moreover,

$$I^1_{\theta XY} = I^1_{\theta X}, \tag{2.4}$$

where $I_{\theta XY} = \mathrm{Var}_{XY\theta}(s)$. Since off-diagonal
blocks of $I_{\theta XY}$ for elements of $\beta_1$ with those of both
$\beta_2$ and $\gamma$ are zero, it can be concluded that observing y
provides no additional information about $\beta_1$ if both $\theta$
and x are also observed.

3. Estimating $\beta$ from x

Of course it is never possible to observe $\theta$. But, since
$\theta$ is missing for all respondents regardless of the values of
$\theta$ and x, it can be considered missing data that is "missing
completely at random," and appropriate likelihood and sampling
distribution inferences follow by marginalizing over $\theta$ in the
complete-data likelihood (Little and Rubin 1987). It is common
practice to estimate $\beta$ from values of x alone, ignoring y even
if it is available, by maximizing the "incomplete-data" loglikelihood

$$\Lambda_X = \sum_i \ln h(x_i|\beta) = \ln [E_\theta(\exp \Lambda_{\theta X} \mid x_1,\ldots,x_N;\beta)].$$

(In the context of IRT, Bock and Aitkin 1981 refer to this
procedure as "marginal maximum likelihood" estimation.) Again
under regularity conditions, the MLE solves what is now an
"incomplete-data" likelihood equation
$0 = \partial\Lambda_X/\partial\beta$. Provided differentials can be
passed through the integral,

$$\begin{aligned}
\frac{\partial\Lambda_X}{\partial\beta}
 &= \sum_i \frac{\partial}{\partial\beta} \ln h(x_i|\beta) \\
 &= \sum_i \int \frac{\partial}{\partial\beta} [f(x_i|\theta,\beta_1)\, g(\theta|\beta_2)]\; h^{-1}(x_i|\beta)\, d\theta \\
 &= \sum_i \int \frac{\partial}{\partial\beta} \ln [f(x_i|\theta,\beta_1)\, g(\theta|\beta_2)]\; f(x_i|\theta,\beta_1)\, g(\theta|\beta_2)\, h^{-1}(x_i|\beta)\, d\theta \\
 &= \sum_i \int s(\theta,x_i)\, p(\theta|x_i,\beta)\, d\theta \\
 &= \sum_i E_\theta(s|x_i).
\end{aligned}$$

Accordingly, the incomplete-data information matrix is $N I_X$, where

$$I_X = E_X[E_\theta(s|x)\, E_\theta(s'|x)] = \mathrm{Var}_X[E_\theta(s|x)],$$

the final equality using $E_X[E_\theta(s|x)] = 0$. $I_X$ is related to
$I_{\theta X}$ through a decomposition of variance, an instance of
Orchard and Woodbury's (1972) "missing information principle":

$$I_{\theta X} = \mathrm{Var}_X[E_\theta(s|x)] + E_X[\mathrm{Var}_\theta(s|x)]
 = I_X + I_{\theta|X}. \tag{3.1}$$

$I_{\theta|X}$, the missing information, is the average variance of
the complete-data gradient vector given x but not $\theta$; that is,
variation in s over possible values of $\theta$ that could give rise
to observed data x, averaged over x. If the variance of
$p(\theta|x,\beta)$ were zero for all x (loosely speaking, if x
determined $\theta$ with complete accuracy), then
$\mathrm{Var}_\theta(s|x) = 0$ for all x, and no information would be
lost as a result of not observing $\theta$. If values of $\theta$ are
not completely determined by x, this variance increases and
information about $\beta$ is decreased. The proportional decrease,
from diagonal elements of $I_{\theta X}$ to those of $I_X$, need not
be the same for all elements of $\beta$.
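The decomposition (3.1) can be checked numerically in a model small enough to enumerate. The sketch below is an illustration under stated assumptions rather than anything taken from the paper: it uses a hypothetical two-class latent class model with three binary items, treats the class proportions `pi` as known, and takes the item response probabilities `p` as the parameters of interest.

```python
import itertools
import numpy as np

pi = np.array([0.6, 0.4])                 # hypothetical class proportions
p = np.array([[0.2, 0.3, 0.4],            # P(x_j = 1 | class 0), hypothetical
              [0.7, 0.8, 0.9]])           # P(x_j = 1 | class 1), hypothetical
K, n = p.shape

def score(x, k):
    """Complete-data gradient of ln f(x | class k) w.r.t. every p[k', j]."""
    s = np.zeros((K, n))
    s[k] = (x - p[k]) / (p[k] * (1.0 - p[k]))
    return s.ravel()

dim = K * n
I_complete = np.zeros((dim, dim))     # E[s s'] over (x, theta)
I_incomplete = np.zeros((dim, dim))   # Var_X[E_theta(s | x)]
I_missing = np.zeros((dim, dim))      # E_X[Var_theta(s | x)]

for bits in itertools.product([0, 1], repeat=n):
    x = np.array(bits, dtype=float)
    f = np.prod(p ** x * (1.0 - p) ** (1.0 - x), axis=1)   # f(x | class k)
    joint = pi * f                                         # P(x, class k)
    h = joint.sum()                                        # marginal P(x)
    post = joint / h                                       # P(class k | x)
    S = np.array([score(x, k) for k in range(K)])          # one row per class
    mean_s = post @ S                                      # E_theta(s | x)
    second = S.T @ (post[:, None] * S)                     # E_theta(s s' | x)
    I_complete += h * second
    I_incomplete += h * np.outer(mean_s, mean_s)
    I_missing += h * (second - np.outer(mean_s, mean_s))

# Share of complete-data information retained for each parameter.
retained = np.diag(I_incomplete) / np.diag(I_complete)
```

In this toy model the retained share differs from one parameter to the next, which is the point made in the last sentence above: the proportional loss need not be uniform across the elements of the parameter vector.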


4. Estimating $\beta$ from x and y

When collateral variables y are available for respondents,
the extended incomplete-data loglikelihood is

$$\begin{aligned}
\Lambda_{XY} &= \sum_i \ln [h(x_i|y_i,\beta)\, p(y_i|\gamma)] \\
 &= \ln [E_\theta(\exp \Lambda_{\theta X} \mid x_1,y_1,\ldots,x_N,y_N;\beta)] + \sum_i \ln p(y_i|\gamma).
\end{aligned}$$

As long as $\beta$ is distinct from $\gamma$, the shape of the
likelihood surface with respect to $\beta$ involves only the first
term, the conditional distribution of the xs given the observed
values of the ys. In particular, the likelihood equation for $\beta$
is $0 = \partial\Lambda_{XY}/\partial\beta$, where

$$\frac{\partial\Lambda_{XY}}{\partial\beta}
 = \sum_i \int s(\theta,x_i)\, p(\theta|x_i,y_i,\beta)\, d\theta
 = \sum_i E_\theta(s|x_i,y_i),$$

with $p(\theta|x,y,\beta) = f(x|\theta,\beta_1)\, g(\theta|y,\beta_2) / h(x|y,\beta)$.
The asymptotic distribution of the MLE under repeated samples of N
(x,y) pairs does involve the distribution of y, however. The block
of the information matrix for $(\beta,\gamma)$ that pertains to
$\beta$ is $N I_{XY}$, where

$$I_{XY} = E_Y E_X[E_\theta(s|x,y)\, E_\theta(s'|x,y)] = \mathrm{Var}_{YX}[E_\theta(s|x,y)].$$

Large sample ML inferences about $\beta$ under repeated sampling of
(x,y) can be based on this block alone, since the off-diagonal
block pertaining to the crossing of $\beta$ and $\gamma$ is zero.
(Section 5 concerns repeated samples of N (x,y) values with the ys
fixed at prespecified values.)

We now focus on the effect of including y upon information
about $\beta_1$. Expressions for the $I^1$ block suffice, since the
nullity of the off-diagonal block in $I_{\theta X}$ implies the
nullity of the corresponding block in $I_X$ (see Appendix). Using
(2.4) and applying the missing information principle,

$$I^1_{\theta X} = I^1_{XY} + E_{XY}[\mathrm{Var}_\theta(s_1|x,y)] = I^1_{XY} + I^1_{\theta|XY}.$$

As in (3.1), the missing information corresponds to a loss in the
precision with which $\beta_1$ can be estimated. The loss is expressed
as expected variation of $s_1$ over possible values of the latent
variable $\theta$, conditional now on values of y as well as those
of x.

Intuitively, less information should be lost if y is observed
along with x. An expression for how much of the missing
information has been recovered begins with another decomposition
of the total variation in $s_1$. Since

$$I^1_{\theta X} = \mathrm{Var}_{XY\theta}(s_1),$$

it follows that

$$\begin{aligned}
I^1_{\theta X} &= \mathrm{Var}_X[E_{Y\theta}(s_1|x)] + E_X\{\mathrm{Var}_Y[E_\theta(s_1|x,y)]\} + E_{XY}[\mathrm{Var}_\theta(s_1|x,y)] \\
 &= I^1_X + I^1_{Y|X} + I^1_{\theta|XY}.
\end{aligned}$$

$I^1_X$ is the variance in expected values of $s_1$ over x, averaging
over y and $\theta$. $I^1_{Y|X}$ is the expected variance of the
average values of $s_1$ with respect to $\theta$ as y varies. It
represents variation in $E_\theta(s_1)$ explained by y beyond that
explained by x. $I^1_{\theta|XY}$ is the expected variation in $s_1$
remaining unexplained after both x and y have been accounted for.

The portion of missing information about $\beta_1$ that is recovered
by using y, then, is $I^1_{Y|X} = I^1_{XY} - I^1_X$, another
application of the missing information principle, with (x,y) treated
as the complete data and x as the incomplete data. When y and
$\theta$ are independent, this term is zero because for each x,
$E_\theta(s_1)$ takes the same value at all values of y. No
information about $\beta_1$ is lost by ignoring y in this case. When
y and $\theta$ are not independent, the degree to which information
about $\beta_1$ increases depends not simply upon the strength of
their relationship, but on the strength of their relationship
conditional on x. There is less to be gained by using collateral
information when $\theta$ is already well determined by x alone.
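The three-way split is the law of total variance applied twice. A worked derivation, written out as a sketch with the conditioning made explicit (the conditioning bars are added here for clarity and are not part of the original notation):

```latex
\begin{aligned}
\mathrm{Var}_{XY\theta}(s_1)
  &= \mathrm{Var}_X[E_{Y\theta}(s_1 \mid x)]
   + E_X[\mathrm{Var}_{Y\theta}(s_1 \mid x)] , \\
\mathrm{Var}_{Y\theta}(s_1 \mid x)
  &= \mathrm{Var}_Y[E_\theta(s_1 \mid x,y) \mid x]
   + E_Y[\mathrm{Var}_\theta(s_1 \mid x,y) \mid x] , \\
I^1_{XY} = \mathrm{Var}_{XY}[E_\theta(s_1 \mid x,y)]
  &= \mathrm{Var}_X[E_{Y\theta}(s_1 \mid x)]
   + E_X\{\mathrm{Var}_Y[E_\theta(s_1 \mid x,y) \mid x]\}
   = I^1_X + I^1_{Y|X} .
\end{aligned}
```

Substituting the second line into the first gives the three terms of the decomposition, and the third line shows why the recovered portion equals the difference between the information using y and the information ignoring it.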

These  results  indicate  that  greater  benefit  accrues  from 
using  collateral  information  as  it  relates  more  strongly  to  the 
latent  variable,  and  as  less  information  is  available  from  the 
observed  responses  x.  Mislevy's  (1987)  analyses  in  the  context  of 
item  response  theory  indicate  that  in  typical  applications  of 
educational  and  psychological  testing,  readily  available 
collateral  variables  such  as  educational  and  demographic  data  can 
often account for a third of the population variance, and increase
the precision of $\beta_1$ roughly as much as two to six additional
test items. This gain is substantial in applications such as
educational assessment or attitudinal surveys, where a subject
might be administered only five or ten items; it is potentially
useful in adaptive testing, where a subject might receive fifteen
well-chosen items. The proportional gain is not impressive with
individual  achievement  tests,  where  test  lengths  of  60  to  100 
items  are  common. 

5.  "Conditional  Expected"  Information 
The  preceding  sections  concern  information  matrices  that 
require  marginalization  over  the  sample  spaces  of  both  x  and  y. 
They  reflect  the  point  of  view  one  has  before  observing  either  x 
or  y  values.  This  section  presents  results  for  expected 
information  conditional  on  given  values  of  y. 

Expected  information  conditional  on  y  is  pertinent  to  the 
problem  of  experimental  design,  for  example.  If  it  is  possible  to 
stratify  on  y  when  gathering  data,  expected  information  for 
various  combinations  of  y  values  can  be  compared  to  choose  an 
optimal  sampling  scheme.  This  requires  expectations  over  x 
conditional on fixed values of y. Let $y = (y_1,\ldots,y_N)$ be a
vector of N specified values of y. Define the mixture density

$$p_y(\theta|\beta_2) = N^{-1} \sum_{i=1}^{N} g(\theta|y_i,\beta_2),$$

which represents the marginal density of $\theta$ in samples drawn in
accordance with y. The complete-data expected information matrix
that corresponds to this density is defined analogously to
$I_{\theta X}$ in Section 2, as

$$I_{\theta X(y)} = \int\!\!\int s(\theta,x)\, s'(\theta,x)\, p_y(\theta|x,\beta)\, d\theta\; h_y(x|\beta)\, dx, \tag{5.1}$$

where

$$h_y(x|\beta) = \int f(x|\theta,\beta_1)\, p_y(\theta|\beta_2)\, d\theta$$

and

$$p_y(\theta|x,\beta) = f(x|\theta,\beta_1)\, p_y(\theta|\beta_2) / h_y(x|\beta).$$

$I_{\theta X(y)}$ is the expected information about $\beta$
corresponding to repeated observations of $(x,\theta)$ sampled in
accordance with y. When, as in practice, observations consist of
(x,y), one can calculate expected information when estimation
ignores y values and when estimation uses y values. These are,
respectively,

$$\begin{aligned}
I_{X(y)} &= \int \left[\int s\, p_y(\theta|x,\beta)\, d\theta\right] \left[\int s'\, p_y(\theta|x,\beta)\, d\theta\right] h_y(x|\beta)\, dx \\
 &= I_{\theta X(y)} - E_{X(y)}[\mathrm{Var}_{\theta(y)}(s|x)]
\end{aligned} \tag{5.2}$$

and

$$I_{Xy} = N^{-1} \sum_{i=1}^{N} \int [E_\theta(s|x,y_i)\, E_\theta(s'|x,y_i)]\; h(x|y_i,\beta)\, dx. \tag{5.3}$$

By the missing information principle, the gain in information
about $\beta_1$ expected when exploiting y for this particular value
of y, or $I^1_{Xy} - I^1_{X(y)}$, is at least positive semidefinite.


6.  Observed  Information 

The expected matrices discussed in the preceding sections are
functions of the true values of $\beta$. In practice, they are
sometimes approximated by substituting maximum likelihood estimates
$\hat\beta$ for $\beta$. The resulting "estimated expected
information matrices" are consistent estimates of the desired
values. They are to be distinguished, however, from "observed
information" matrices.

An observed information matrix is the negative of the second
derivative matrix of the loglikelihood, calculated with maximizing
value $\hat\beta$, the observed responses $(x_1,\ldots,x_N)$, and, if
required, $(y_1,\ldots,y_N)$. Observed information reflects the point
of view after N sampled values of both x and y have been observed,
and indicates the precision with which $\beta$ has been estimated
from the realized sample (Efron and Hinkley 1978). In a large-sample
normal approximation to the posterior distribution of $\beta$ under
Bayesian inference, the posterior variance is the inverse of the
observed information.

Define the complete-data second derivative $d(\theta,x)$ as
$\partial s(\theta,x)/\partial\beta'$. Using Louis' (1982) expressions
for observed information in missing data problems,

$$N I_x = \sum_i \left[\int (-d)\, p(\theta|x_i,\hat\beta)\, d\theta - \int ss'\, p(\theta|x_i,\hat\beta)\, d\theta\right]$$

and

$$N I_{xy} = \sum_i \left[\int (-d)\, p(\theta|x_i,y_i,\hat\beta)\, d\theta - \int ss'\, p(\theta|x_i,y_i,\hat\beta)\, d\theta\right].$$

In contrast to results with expected information, the off-diagonal
blocks of $I_x$ and $I_{xy}$ for the crossing of $\beta_1$ and
$\beta_2$ need not be zero. Moreover, the appealing decomposition of
expected information into variance components of s does not carry
over to observed information; it depended on the fact that
$E(ss') = -E(d)$. For unfortuitous combinations of x and y values,
$I_x$ can exceed $I_{xy}$ in some or even all diagonal elements.
Inasmuch as observed information is quite generally a consistent
estimate of expected information, however, the results of Section 4
suggest that one would expect to find the greater diagonal entries
in $I_{xy}$ more often than not.


7.  A  Numerical  Illustration 


This section illustrates the ideas developed above in the
context of an item response theory (IRT) model for mental test
data. The values of $I_{\theta X(y)}$, $I_{X(y)}$, and $I_{Xy}$ (the
quantities relevant to "conditional expected" information) are
approximated here by evaluating (5.1) through (5.3) with
$\hat\beta$ in place of $\beta$.

7.1 The Data

Observed responses x are vectors of responses to four items
from the Arithmetic Reasoning test of the Armed Services
Vocational Aptitude Battery (ASVAB), Form 8A, as observed in the
sample of respondents from the survey Profile of American Youth
(U.S. Department of Defense 1982) whose data are reported by
Mislevy (1986). Response counts are shown in Table 1 for the
N=776 respondents as a whole, and as broken down into the four
categories of a demographic design with subsample sizes of
263, 228, 140, and 145. A correct response is indicated by a 1,
an incorrect response by a 0.


Table  1  about  here 


7.2  The  Model 

Using the numerical procedure described in Mislevy (1987),
the 2-parameter logistic (2PL) IRT model was fit to the response
counts in Table 1 by maximizing a loglikelihood of the form of
(2.1). Under the usual IRT assumption of conditional
independence, and with $\beta_1$ representing the item parameters
$(a_1, b_1, \ldots, a_4, b_4)$, we have

$$f(x|\theta,\beta_1) = \prod_j \exp[1.7 a_j x_j (\theta - b_j)] / \{1 + \exp[1.7 a_j (\theta - b_j)]\}$$

or, equivalently,

$$f(x|\theta,\beta_1) = \prod_j P_j(\theta)^{x_j}\, Q_j(\theta)^{1 - x_j},$$

where $P_j(\theta) = f_j(x_j = 1|\theta,a_j,b_j)$ and
$Q_j(\theta) = 1 - P_j(\theta)$. The parameters $a_j$ and $b_j$ give
the (linear) regression of the logit of $x_j$ on $\theta$. Normal
densities were assumed for $g(\theta|y,\beta_2)$, for
$y = 1,\ldots,4$. The population parameters
$\beta_2 = (\mu_1,\sigma_1,\ldots,\mu_4,\sigma_4)$ were constrained
in order to make the model identified, by incorporating the
computationally convenient constraints $\sum_y \mu_y = 0$ and
$\sum_y (\sigma_y^2 + \mu_y^2)/4 = 1$. The resulting MLE's are shown
in Table 2.
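The two ways of writing the 2PL pattern likelihood can be compared directly. The sketch below evaluates both forms at one value of the latent variable; the item parameters, response pattern, and function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def pattern_likelihood_exp_form(x, theta, a, b):
    """Product over items of exp[1.7 a x (theta - b)] / {1 + exp[1.7 a (theta - b)]}."""
    z = 1.7 * a * (theta - b)
    return float(np.prod(np.exp(x * z) / (1.0 + np.exp(z))))

def pattern_likelihood_pq_form(x, theta, a, b):
    """Product over items of P^x * Q^(1 - x), with P the logistic response function."""
    P = 1.0 / (1.0 + np.exp(-1.7 * a * (theta - b)))
    Q = 1.0 - P
    return float(np.prod(P ** x * Q ** (1 - x)))

a = np.array([0.9, 1.1, 1.3, 0.7])      # hypothetical slopes
b = np.array([-1.0, -0.2, 0.4, 1.1])    # hypothetical locations
x = np.array([1, 0, 1, 1])              # one response pattern
theta = 0.3

lik_exp = pattern_likelihood_exp_form(x, theta, a, b)
lik_pq = pattern_likelihood_pq_form(x, theta, a, b)
```

The two values agree to floating-point precision, since the exponential form reduces to P when the response is 1 and to Q when it is 0.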


Table  2  about  here 


From this point on, the MLEs shown in Table 2 will be treated
as known true values for the purpose of approximating expected
information matrices. Using the results of Section 5, expected
information matrices will be calculated conditional on the
observed subsample proportions p(y) = (.339, .294, .180, .187).
The density $p_y(\theta)$ that obtains after fixing y in this manner
is thus a mixture of four normal components:

$$p_y(\theta|\beta_2) = \sum_y g(\theta|\mu_y,\sigma_y)\, p(y).$$

Values of y account for about 18-percent of the variation of
$\theta$ in the mixture.
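The share of variation attributable to y in a normal mixture is the between-component variance divided by the total variance. The sketch below shows the calculation; because the Table 2 estimates are not reproduced here, the means and standard deviations are hypothetical values chosen only to satisfy the two identification constraints, so the resulting share is not expected to match the 18-percent figure.

```python
import numpy as np

def share_explained(p_y, mu, sigma):
    """Between-component variance as a fraction of total mixture variance."""
    p_y, mu, sigma = (np.asarray(v, dtype=float) for v in (p_y, mu, sigma))
    grand_mean = np.sum(p_y * mu)
    between = np.sum(p_y * (mu - grand_mean) ** 2)
    within = np.sum(p_y * sigma ** 2)
    return float(between / (between + within))

p_y = np.array([0.339, 0.294, 0.180, 0.187])   # subsample proportions from the text
mu = np.array([0.4, 0.2, -0.3, -0.3])          # hypothetical means, summing to zero
sigma = np.full(4, np.sqrt(0.905))             # hypothetical, so mean(sigma^2 + mu^2) = 1

share = share_explained(p_y, mu, sigma)
```

A share near zero would mean that knowing y says little about the latent variable, in which case Section 4 implies little missing information can be recovered.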


7.3  Formulae 


Let $u_j$ be an element of $\beta_{1j}$. Let $W_j(\theta)$ take the
value $1.7(\theta - b_j)$ if $u_j = a_j$, and the value $-1.7a_j$ if
$u_j = b_j$. The element of the complete-data gradient vector
corresponding to $u_j$ is

$$s(u_j;\theta,x) = [x_j - P_j(\theta)]\, W_j(\theta).$$
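The gradient formula can be checked against a finite-difference derivative of the 2PL log-likelihood. This is a minimal sketch with hypothetical item parameters; the function names are illustrative only.

```python
import numpy as np

def log_f(x, theta, a, b):
    """ln f(x | theta, beta_1) for the 2PL model with scaling constant 1.7."""
    z = 1.7 * a * (theta - b)
    return float(np.sum(x * z - np.log1p(np.exp(z))))

def score(x, theta, a, b):
    """Analytic gradient [x_j - P_j] * W_j, ordered (a_1, b_1, a_2, b_2, ...)."""
    P = 1.0 / (1.0 + np.exp(-1.7 * a * (theta - b)))
    resid = x - P
    return np.column_stack([1.7 * (theta - b) * resid, -1.7 * a * resid]).ravel()

def numeric_score(x, theta, a, b, eps=1e-6):
    """Central finite differences in the same parameter order."""
    grad = []
    for j in range(len(a)):
        for name in ("a", "b"):
            hi = {"a": a.copy(), "b": b.copy()}
            lo = {"a": a.copy(), "b": b.copy()}
            hi[name][j] += eps
            lo[name][j] -= eps
            diff = log_f(x, theta, hi["a"], hi["b"]) - log_f(x, theta, lo["a"], lo["b"])
            grad.append(diff / (2.0 * eps))
    return np.array(grad)

a = np.array([0.9, 1.1, 1.3, 0.7])      # hypothetical slopes
b = np.array([-1.0, -0.2, 0.4, 1.1])    # hypothetical locations
x = np.array([1.0, 0.0, 1.0, 1.0])
theta = 0.3

analytic = score(x, theta, a, b)
numeric = numeric_score(x, theta, a, b)
```

Agreement between `analytic` and `numeric` confirms both the residual term and the two values of the multiplier W.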


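Computing expected information requires the expected count of each
response pattern $x_\ell = (x_{\ell 1},\ldots,x_{\ell 4})$. In
subpopulation y, this value is

$$\hat N_{\ell y} = N_y \int f(x_\ell|\theta,\hat\beta_1)\, g(\theta|y,\hat\beta_2)\, d\theta.$$

For the undifferentiated population (given the observed values of
y) as a whole, $\hat N_\ell = \sum_y \hat N_{\ell y}$. These values
are given in Table 3.

Table 3 about here

Expected information matrices are now obtained as follows:

$$I_{\theta X(y)} = \sum_\ell \hat N_\ell \int s_\ell s_\ell'\, p_y(\theta|x_\ell)\, d\theta,$$

where $s_\ell = s(\theta,x_\ell)$ and
$p_y(\theta|x_\ell) = f(x_\ell|\theta,\beta_1)\, p_y(\theta|\beta_2) / h_y(x_\ell|\beta)$;

$$I_{X(y)} = \sum_\ell \hat N_\ell \left[\int s_\ell\, p_y(\theta|x_\ell)\, d\theta\right] \left[\int s_\ell'\, p_y(\theta|x_\ell)\, d\theta\right];$$

$$I_{Xy} = \sum_\ell \sum_y \hat N_{\ell y} \left[\int s_\ell\, p(\theta|x_\ell,y)\, d\theta\right] \left[\int s_\ell'\, p(\theta|x_\ell,y)\, d\theta\right],$$

where $p(\theta|x_\ell,y) = f(x_\ell|\theta,\beta_1)\, g(\theta|y,\beta_2) / h(x_\ell|y,\beta)$.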
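The three sums can be sketched in a few lines once the integrals are replaced by quadrature. The code below is an illustration under stated assumptions, not a reproduction of the reported analysis: the item parameters and group distributions are hypothetical stand-ins for the Table 2 estimates, only the subsample sizes come from the text, and Gauss-Hermite quadrature is an assumed numerical method.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

a = np.array([1.0, 0.8, 1.2, 0.9])             # hypothetical slopes
b = np.array([-0.8, -0.2, 0.3, 0.9])           # hypothetical locations
mu = np.array([0.4, 0.2, -0.3, -0.3])          # hypothetical group means
sigma = np.full(4, 0.95)                       # hypothetical group SDs
N_y = np.array([263.0, 228.0, 140.0, 145.0])   # subsample sizes from the text

z, w = hermegauss(61)
w = w / w.sum()                                # N(0, 1) quadrature weights

def f_and_scores(x, theta):
    """f(x | theta) and the item-parameter gradient at every quadrature node."""
    P = 1.0 / (1.0 + np.exp(-1.7 * a * (theta[:, None] - b)))
    f = np.prod(np.where(x == 1, P, 1.0 - P), axis=1)
    r = x - P
    S = np.stack([1.7 * (theta[:, None] - b) * r, -1.7 * a * r], axis=2)
    return f, S.reshape(len(theta), -1)        # columns: a1, b1, a2, b2, ...

dim = 2 * len(a)
I_complete = np.zeros((dim, dim))              # complete data, (x, theta) observed
I_ignore_y = np.zeros((dim, dim))              # incomplete data, y ignored
I_use_y = np.zeros((dim, dim))                 # incomplete data, y used
total_count = 0.0

for bits in itertools.product([0, 1], repeat=len(a)):
    x = np.array(bits)
    count_l = 0.0
    weighted_score_l = np.zeros(dim)
    for y in range(len(mu)):
        theta = mu[y] + sigma[y] * z           # nodes for the group-y normal
        f, S = f_and_scores(x, theta)
        wf = w * f
        count_ly = N_y[y] * wf.sum()           # expected count in group y
        mean_ly = (wf @ S) / wf.sum()          # E_theta(s | x, y)
        I_use_y += count_ly * np.outer(mean_ly, mean_ly)
        I_complete += N_y[y] * (S.T @ (wf[:, None] * S))
        weighted_score_l += N_y[y] * (wf @ S)
        count_l += count_ly
    mean_l = weighted_score_l / count_l        # E_theta(s | x) under the mixture
    I_ignore_y += count_l * np.outer(mean_l, mean_l)
    total_count += count_l

recovered = np.diag(I_use_y - I_ignore_y) / np.diag(I_complete - I_ignore_y)
```

The vector `recovered` gives, for each item parameter, the fraction of missing information regained by using y. The ordering of the three matrices (complete at least as large as using y, which is at least as large as ignoring y) holds for any parameter values, which makes it a useful check on an implementation.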


7.4  Results 

Tables 4, 5, and 6 present $I_{\theta X(y)}$, $I_{X(y)}$, and
$I_{Xy}$. These matrices are block-diagonal, with only off-diagonal
elements for the a and b parameters of a given item taking possibly
non-zero values. The proportions of effective information and partial
recovery for the diagonal elements are summarized in Table 7.
Compared to the information expected if $(x,\theta)$ were observed,
the degree of information expected when only x is observed averages
36-percent for a parameters and 85-percent for b parameters.
Using (x,y) yields corresponding values of 40-percent and
87-percent. Averaging over item-level results, the degree to
which missing information is recoverable (for the observed values
of y) is 7-percent for a's and 17-percent for b's.
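These summary percentages can be recomputed from the diagonal entries of Tables 4 through 6. The sketch below transcribes those entries (reading the split entry for b3 in Table 6 as 222.03) and averages the item-level ratios; the dictionary and function names are illustrative only.

```python
import numpy as np

# Diagonal entries for items 1 to 4, transcribed from Tables 4, 5, and 6.
complete = {"a": [176.75, 290.81, 254.24, 249.38],
            "b": [371.64, 201.01, 250.70, 267.98]}
ignoring_y = {"a": [51.76, 114.82, 90.72, 95.65],
              "b": [298.63, 179.77, 215.75, 224.58]}
using_y = {"a": [53.39, 119.79, 105.18, 116.72],
           "b": [308.91, 183.28, 222.03, 232.98]}

def summarize(kind):
    """Average item-level ratios for one parameter type ('a' or 'b')."""
    c, i, u = (np.array(t[kind]) for t in (complete, ignoring_y, using_y))
    return {
        "effective_ignoring_y": float(np.mean(i / c)),
        "effective_using_y": float(np.mean(u / c)),
        "recovered": float(np.mean((u - i) / (c - i))),
    }

summary = {kind: summarize(kind) for kind in ("a", "b")}
```

The averages come out near 0.36, 0.40, and 0.07 for the a parameters and 0.85, 0.87, and 0.17 for the b parameters, in agreement with the percentages quoted above.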


Tables  4-7  about  here 
Appendix

Section 4 uses the large-sample result that if the off-diagonal
block in $I_{\theta X}$ corresponding to the crossing of $\beta_1$
and $\beta_2$ is null, so is the corresponding block in $I_X$. This
appendix gives a heuristic argument in support of that claim. Since
the observed information matrix $I_x$ introduced in Section 6 is a
consistent estimate of $I_X$, it suffices to show that the
expectation of the off-diagonal block in $I_x$ is null.

From Section 6,

$$N I_x = \sum_i \left[\int (-d)\, p(\theta|x_i,\hat\beta)\, d\theta - \int ss'\, p(\theta|x_i,\hat\beta)\, d\theta\right].$$

Substituting $\beta$ for $\hat\beta$, and taking expectation over x
gives

$$E_X(I_x) = E_X[E_\theta(-d|x)] - E_X[E_\theta(ss'|x)]. \tag{A.1}$$

It is clear from (2.2) that the off-diagonal block of d is null
for all x and all $\theta$, so the corresponding block of the first
term on the right of (A.1) is null. The second term is
$\mathrm{Var}_{X\theta}(s)$, or $I_{\theta X}$, which, from Section 2,
also has a null off-diagonal block. The off-diagonal block of the
matrix difference between the two terms must be null too.
Table 1

Total         263      228      140      145      776


Table 2

Maximum Likelihood Estimates of


Table 3

Expected Counts of Response Patterns

Response                      y
1 2 3 4         1        2        3        4    Total

0 0 0 0     27.17    27.85    31.70    32.34      ...
0 0 0 1       ...      ...      ...      ...      ...
0 0 1 0      7.55     8.97     8.46     8.73    33.71
0 0 1 1      3.16     3.56     2.31     2.41    11.44
0 1 0 0       ...      ...      ...      ...      ...
0 1 0 1      4.80      ...     3.77     3.92    17.98
0 1 1 0      7.07      ...      ...      ...      ...
0 1 1 1      6.00      ...      ...      ...      ...
1 0 0 0       ...    18.75    16.23    16.79      ...
1 0 0 1       ...      ...      ...      ...      ...
1 0 1 0       ...    13.04     7.68      ...      ...
1 0 1 1     14.77    11.66     3.80     3.99    34.22
1 1 0 0     17.86    19.80    12.37    12.90    62.93
1 1 0 1     19.51    16.26     5.76     6.05    47.58
1 1 1 0       ...    22.79     8.46     8.88      ...
1 1 1 1       ...      ...      ...      ...      ...
Table 4

Expected Complete-Data Information

        a1       b1       a2       b2       a3       b3       a4       b4

a   176.75            290.81            254.24            249.38
b   -68.60   371.64   -37.80   201.01    31.42   250.70    75.89   267.98

Note: Only diagonal blocks are shown; all other entries are zero.


Table 5

Expected Incomplete-Data Information, Ignoring y

        a1       b1       a2       b2       a3       b3       a4       b4

a    51.76            114.82             90.72             95.65
b   -55.09   298.63   -31.91   179.77    31.70   215.75    70.80   224.58

Note: Only diagonal blocks are shown; all other entries are zero.


Table 6

Expected Incomplete-Data Information, Using y

        a1       b1       a2       b2       a3       b3       a4       b4

a    53.39            119.79            105.18            116.72
b   -46.78   308.91   -22.49   183.28    44.33   222.03    85.86   232.98

Note: Only diagonal blocks are shown; all other entries are zero.